This supplementary information describes the methods of the behavioural studies that were carried out in English and Swedish.

Project overview

The data presented in this report come from two projects, one conducted in English (Xin’s data) and one in Swedish (Maryann’s data). Henceforth, we use native_language (English vs. Swedish) as a variable to distinguish between the two datasets. In each case, we compare perceptual adaptation effects between the two levels of Group (/d/-exposure vs. control). Each experiment consists of an exposure phase followed by a test phase. Here we analyze the test phase data in order to compare human responses to the predictions of Ideal Observers.

Ideal Observers are constructed for each project. Categorization predictions are made on the test data from a) a native language model (trained on tokens from a native speaker of the target language; English or Swedish) and b) a non-native talker-specific model (trained on tokens from the test speaker; native Mandarin or native Flemish). Henceforth, we use modelType (native vs. non-native) to distinguish the two types of models.

In sum, the following data are analyzed and reported: 1) Xin’s perception experiment: goodness rating task; 2) Maryann’s perception experiment: goodness rating task; 3) Ideal Observer predictions; 4) Xin’s perception experiment: ID task from Xie et al. (2017).

Results of perception experiment

We compare the results of the perception experiments on English and Swedish. We first compare lexical decision accuracy during exposure. Then we compare the effect of the exposure manipulation on ratings of /d/-goodness during test.

Lexical decision accuracy during exposure

Lexical decision accuracy during exposure was high across exposure groups and experiments. In particular, the fact that participants in the /d/-exposure group recognized the critical exposure words with syllable-final /d/ (and without minimal pair neighbors) justifies the assumption of lexically-guided adaptation made by the ideal observer we introduce below. The figure below summarizes participants’ accuracy and reaction times during exposure.

Figure: Lexical decision accuracies and RTs by language and group

English

Response accuracy indicated that critical /d/-final words were largely judged to be real words by the experimental group.

Swedish

Lexical decision accuracy was high across both the /d/-exposure (M = 96.4, SD = 2.6) and control group (M = 96.8, SD = 1.1). Accuracy was also high for critical words with syllable-final /d/ in the /d/-exposure group (M = 94.2, SD = 6). Notably, one participant in the /d/-exposure group was correct on less than 90% of the critical words, but even that participant categorized most critical words correctly (80%).

Reaction times were slower in the /d/-exposure group (control group: M = 0.6, SD = 0.2; /d/-exposure group: M = 0.9, SD = 0.4). The very high average reaction time for participant 101 in the /d/-exposure group was caused by a single outlier trial, on which the participant took 35 seconds to respond. The difference between exposure conditions does, however, persist if reaction times more than 3.5 SDs away from a participant’s mean RT are excluded (control group: M = 0.6, SD = 0.2; /d/-exposure group: M = 0.8, SD = 0.4).
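The exclusion rule described above can be sketched as follows. This is a minimal illustration in Python, not the code used for the reported analyses. It assumes that RTs are in seconds, that the cutoff is applied in both directions, and that each participant's mean and SD are computed over all of that participant's trials (including the outlier). The participant labels and RT values are hypothetical.

```python
from statistics import mean, stdev

def trim_rts(rts_by_participant, n_sd=3.5):
    """Drop trials whose RT lies more than n_sd SDs from that participant's mean RT."""
    trimmed = {}
    for participant, rts in rts_by_participant.items():
        m, s = mean(rts), stdev(rts)
        # Keep everything if the participant shows no variation at all (s == 0).
        trimmed[participant] = [rt for rt in rts if s == 0 or abs(rt - m) <= n_sd * s]
    return trimmed

# Hypothetical data: 19 ordinary trials plus one 35-second trial for one participant.
example = {
    "p101": [0.6] * 19 + [35.0],
    "p102": [0.5, 0.7, 0.6, 0.8],
}
trimmed = trim_rts(example)
```

Because the SD is computed with the outlier included, a single extreme trial can only exceed the 3.5 SD cutoff when the participant contributes enough trials, which is one reason to check the trimmed trial counts per participant.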

This suggests that the syllable-final /d/ of the Flemish-accented speaker was indeed accented, leading to additional difficulty beyond whatever other difficulty participants might have experienced while processing the foreign accent.

Swedish vs. English

To compare lexical decision accuracy across the two experiments, we conducted a mixed-effects logistic regression (Breslow & Clayton, 1993; Jaeger, 2008) over the combined data from both experiments. The regression predicted correct (1) vs. incorrect (0) answers based on the exposure group (sum-coded: 1 = /d/-exposure vs. -1 = control), the experiment (sum-coded: 1 = Swedish vs. -1 = English), and their interaction. The regression contained the maximal random effect structure justified by the design (random by-participant intercepts; random by-item intercepts and slopes for exposure group; item IDs did not overlap between experiments).
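As an illustration of the sum coding described above, the following Python sketch builds the fixed-effect predictors for each combination of exposure group and experiment. It only shows how the codes and their interaction are formed; it does not fit the mixed-effects model. The function and variable names are chosen here for illustration.

```python
# Sum (deviation) codes as given in the text.
GROUP_CODE = {"/d/-exposure": 1, "control": -1}
EXPERIMENT_CODE = {"Swedish": 1, "English": -1}

def code_trial(group, experiment):
    """Return the fixed-effect predictors for one trial."""
    g = GROUP_CODE[group]
    e = EXPERIMENT_CODE[experiment]
    # The interaction predictor is the product of the two sum-coded predictors.
    return {"intercept": 1, "group": g, "experiment": e, "group:experiment": g * e}

# One row per design cell (2 groups x 2 experiments).
design = [code_trial(g, e) for e in EXPERIMENT_CODE for g in GROUP_CODE]
```

With codes of 1 and -1, each coefficient corresponds to half the difference between the two levels of that predictor (on the logit scale for a logistic model), evaluated at the unweighted average of the other predictor's levels.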

Ratings during test

We focus here on goodness ratings for test words ending in /d,t/ (unlike the English data, the Swedish data also contained ratings for words beginning with /d,t/; these are not analyzed here, though we mention for completeness’s sake that this later test block also did not reveal a significant effect of exposure group for the Swedish data).

Description of datasets

  • Swedish: 23 participants (11 experimental, 12 control); 30 word pairs; each participant rated each word (one of the pair) either for goodness as /d/ or for goodness as /t/, resulting in 15 ratings per sound category (/d/ vs. /t/) per rating type (goodness as /d/ vs. goodness as /t/) and 60 ratings per participant. Total observations: 23 * 15 * 2 * 2 = 1380

  • English: 48 participants (24 experimental, 24 control); 60 word pairs; each participant rated each word (one of the pair) both for goodness as /d/ and for goodness as /t/, resulting in 30 ratings per sound category (/d/ vs. /t/) per rating type (goodness as /d/ vs. goodness as /t/) and 120 ratings per participant. Total observations: 48 * 30 * 2 * 2 = 5760
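The observation counts above follow directly from the design. A short Python check of the arithmetic (the function name is chosen here for illustration):

```python
def total_observations(n_participants, ratings_per_cell, n_sounds=2, n_rating_types=2):
    """Participants x ratings per (sound category, rating type) cell x number of cells."""
    return n_participants * ratings_per_cell * n_sounds * n_rating_types

swedish = total_observations(23, 15)   # 23 * 15 * 2 * 2
english = total_observations(48, 30)   # 48 * 30 * 2 * 2
combined = swedish + english
```

The combined total of 7140 matches the number of observations reported in the output of the combined regression model.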

A note on data transformation

In the English experiment, each token has one rating from the goodness-as-/d/ task and one rating from the goodness-as-/t/ task. In the Swedish experiment (Maryann’s data), each token has either a rating from the goodness-as-/d/ task or a rating from the goodness-as-/t/ task. To facilitate comparison and interpretation, we apply the following transformations:

-1) We transform all goodness ratings for /t/ into goodness-as-/d/ (if a token is rated 3 in the goodness-as-/t/ task, it is transformed to 5 to indicate goodness-as-/d/). This means that for the English data, each token has two goodness-as-/d/ ratings. Fig1 shows that the two goodness-as-/d/ ratings are correlated.

As a caveat, table1 shows that, on average, the goodness-as-/d/ value for the same tokens can differ somewhat between the two tasks. The values show the goodness-as-/d/ after the transformation.

-2) With this caveat, for item-based analyses relating human ratings to ideal observer predictions, the mean of the two ratings is used to indicate the /d/ goodness of tokens in the English data. This issue doesn’t apply to the Swedish data since there is only one rating per token. This does NOT affect analyses of the participant data.

-3) As a sanity check, Fig2 shows that item-wise responses from the ID task and the rating tasks are consistent (Xin’s data only).

## # A tibble: 8 x 4
## # Groups:   Group, Sound [4]
##   Group        Sound Rating.for mean_d
##   <chr>        <fct> <fct>       <dbl>
## 1 /d/-exposure d     d           0.497
## 2 /d/-exposure d     t           0.479
## 3 /d/-exposure t     d          -0.399
## 4 /d/-exposure t     t          -0.576
## 5 control      d     d           0.473
## 6 control      d     t           0.335
## 7 control      t     d          -0.256
## 8 control      t     t          -0.552
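The transformation in steps 1) and 2) can be sketched in Python as follows. The sketch assumes the 1-7 rating scale, which is consistent with the example in the text (a goodness-as-/t/ rating of 3 becomes a goodness-as-/d/ rating of 5). Any standardization applied before tabulating the means is not shown, and the function names are chosen here for illustration.

```python
SCALE_MIN, SCALE_MAX = 1, 7  # assumed endpoints of the goodness rating scale

def as_d_goodness(rating, rating_for):
    """Express a rating as goodness-as-/d/; /t/ ratings are mirrored around the scale midpoint."""
    if rating_for == "d":
        return rating
    if rating_for == "t":
        return SCALE_MIN + SCALE_MAX - rating
    raise ValueError(f"unknown rating type: {rating_for!r}")

def item_d_goodness(ratings):
    """Mean goodness-as-/d/ for one token, given (rating, rating_for) pairs."""
    values = [as_d_goodness(rating, rating_for) for rating, rating_for in ratings]
    return sum(values) / len(values)

# English data: one rating of each type per token. Swedish data: a single rating per token.
english_token = item_d_goodness([(6, "d"), (3, "t")])
swedish_token = item_d_goodness([(2, "t")])
```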

Comparing ID task and Rating task

To understand human responses, below we plot the item-level correlation between human responses from the ID task and the rating task available for the English dataset. For the rating task, the two original ratings (goodness as /d/ and goodness as /t/) were transformed into goodness-as-/d/ and averaged to yield a single metric.

English

Swedish

Swedish vs. English

Bar graph of human rating results from the two experiments: Swedish and English. It appears that the two experiments yielded different patterns: the English experiment showed a Group difference, whereas the Swedish experiment did not.


Methods of perception experiments

We describe the methods used to derive the English (Xie, Theodore, & Myers, 2017) and Swedish data sets. Both experiments employed an exposure-test paradigm. Exposure was manipulated between participants. Both groups were exposed to foreign-accented speech from the same talker (Mandarin-accented English for the English data; Flemish-accented Swedish for the Swedish data). The two groups of participants differed, however, in whether the exposure materials contained information about the critical phonological category (syllable-final /d/), for which the foreign accent is known to deviate from native pronunciations. The control group never heard any instances of syllable-final /d/ or /t/. The /d/-exposure group heard words with syllable-final /d/, but no words with syllable-final /t/. Following exposure, both groups went through the exact same test phase, during which they made goodness judgments of /d/-/t/ tokens that were part of minimal pairs (e.g., “seed” or “seat”).

In addition to the L1-L2 language pairs (Mandarin-accented English vs. Flemish-accented Swedish) and participants’ L1 (American English vs. Swedish), the two experiments exhibited a number of differences that we detail next. These include differences in i) the number of participants, ii) the instructions and visual appearance of the experiment, iii) the number of stimuli, and iv) minor differences in the stimulus design. As indicated below, some of these differences were intended, some were not. For each difference, we consider whether it is likely to explain the difference in results. We find this to be unlikely, at least compared to the possibility explored in this study: that it is primarily the phonetic properties of the foreign-accented speech and their relation to native listeners’ expectations that cause the difference in results.

Participants

English

48 monolingual English speakers participated in the experiment. They were assigned in equal numbers to the /d/-exposure and control groups.

Swedish

25 Swedish speakers recruited from the Department of Swedish & Multilingualism at Stockholm University participated. Two of the participants were excluded from the analysis because post-experiment surveys found that they were not native speakers of Swedish. Participants were alternately assigned to the /d/-exposure or control group.

Why the difference?

The decision to recruit a smaller number of participants was made because 1) the Swedish experiment was conceived as a pilot experiment for a larger series of experiments still to be conducted, and 2) previous studies had found significant effects with a similarly small number of participants (12 participants for each of two conditions in Eisner, Melinger, & Weber, 2013).

Are differences in the number of participants likely to explain the results?

To compare participants’ ratings for the /d,t/-final test words across the two experiments, we conducted a linear mixed-effects regression (Baayen et al., 2008) over the combined data from both experiments. The regression predicted /d/-goodness ratings (1-7) based on the sound category (sum-coded: 1 = word recording intended to end in /d/ vs. -1 = word recording intended to end in /t/), exposure group (sum-coded: 1 = /d/-exposure vs. -1 = control), the experiment (sum-coded: 1 = Swedish vs. -1 = English), and all their interactions. The regression contained the maximal random effect structure justified by the design (random by-participant intercepts and slope for sound; random by-item intercepts and slopes for sound, exposure group, and their interaction; item IDs did not overlap between experiments).

We examine the Swedish (n obs = 1380) and English (n obs = 5760) data together, with the three-way interaction Native Language (Swedish = 1 vs. English = -1) × Group (/d/-exposure = 1 vs. control = -1) × Sound (d = 1, t = -1) and all lower-order terms as fixed effects, and random effects for word pair and participant (see the model formula in the output below).

##         [,1]
## Swedish    1
## English   -1
##              [,1]
## /d/-exposure    1
## control        -1
##   [,1]
## d    1
## t   -1
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: zrating.simplified_d ~ native_language * Group * Sound + (1 +  
##     Sound * Group | MinimalPairID) + (1 + Sound | Participant)
##    Data: d.test.all.rating
## 
## REML criterion at convergence: 16543
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3423 -0.6531 -0.0198  0.6283  3.9799 
## 
## Random effects:
##  Groups        Name          Variance  Std.Dev. Corr             
##  MinimalPairID (Intercept)   0.1109761 0.33313                   
##                Sound1        0.0563861 0.23746   0.41            
##                Group1        0.0004223 0.02055  -1.00 -0.34      
##                Sound1:Group1 0.0002166 0.01472   0.24  0.98 -0.17
##  Participant   (Intercept)   0.0000000 0.00000                   
##                Sound1        0.0190064 0.13786   NaN             
##  Residual                    0.5480126 0.74028                   
## Number of obs: 7140, groups:  MinimalPairID, 90; Participant, 71
## 
## Fixed effects:
##                                  Estimate Std. Error         df t value
## (Intercept)                     3.156e-14  3.886e-02  9.295e+01   0.000
## native_language1                1.148e-14  3.886e-02  9.295e+01   0.000
## Group1                         -1.569e-15  1.134e-02  1.357e+03   0.000
## Sound1                          5.550e-01  3.368e-02  1.321e+02  16.482
## native_language1:Group1        -6.422e-16  1.134e-02  1.357e+03   0.000
## native_language1:Sound1         1.091e-01  3.368e-02  1.321e+02   3.238
## Group1:Sound1                   4.188e-03  2.078e-02  7.080e+01   0.202
## native_language1:Group1:Sound1 -3.743e-02  2.078e-02  7.080e+01  -1.801
##                                Pr(>|t|)    
## (Intercept)                     1.00000    
## native_language1                1.00000    
## Group1                          1.00000    
## Sound1                          < 2e-16 ***
## native_language1:Group1         1.00000    
## native_language1:Sound1         0.00152 ** 
## Group1:Sound1                   0.84088    
## native_language1:Group1:Sound1  0.07596 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ntv_l1 Group1 Sound1 nt_1:G1 n_1:S1 Gr1:S1
## nativ_lngg1  0.356                                           
## Group1      -0.184 -0.055                                    
## Sound1       0.311  0.104 -0.055                             
## ntv_lng1:G1 -0.055 -0.184  0.603 -0.018                      
## ntv_lng1:S1  0.104  0.311 -0.018  0.369 -0.055               
## Group1:Snd1  0.018  0.006 -0.003  0.080 -0.001   0.040       
## ntv_1:G1:S1  0.006  0.018 -0.001  0.040 -0.003   0.080  0.427
## convergence code: 0
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular

Results: there was a marginal three-way interaction between native language, Group, and Sound (estimate = -0.037, t = -1.80, p = .076). Next, we check whether the three-way interaction still holds when we down-sample the English data to match the number of participants in the Swedish dataset.

## boundary (singular) fit: see ?isSingular
## # A tibble: 10 x 5
## # Groups:   term, significance [6]
##    term                         significance direction     n effects       
##    <chr>                        <chr>        <chr>     <int> <chr>         
##  1 native_language1:Group1:Sou… no           -            51 3-way interac…
##  2 native_language1:Group1:Sou… no           +             6 3-way interac…
##  3 native_language1:Group1:Sou… yes          -            43 3-way interac…
##  4 native_languageEnglish:Grou… no           -             7 simple effects
##  5 native_languageEnglish:Grou… no           +            53 simple effects
##  6 native_languageEnglish:Grou… yes          +            40 simple effects
##  7 native_languageSwedish:Grou… no           -            54 simple effects
##  8 native_languageSwedish:Grou… no           +            20 simple effects
##  9 native_languageSwedish:Grou… yes          -            25 simple effects
## 10 native_languageSwedish:Grou… yes          +             1 simple effects

Compared to the English experiment, the Swedish experiment had substantially fewer participants (about 50% fewer). As described in Section XXX, the Swedish experiment also employed fewer test tokens than the English experiment (about 50% fewer). Since a benefit of /d/-exposure was found for the larger (English) but not the smaller (Swedish) data set, one obvious question is whether the difference in results is due to statistical power: the null result for the Swedish data might simply reflect a Type II error.

To address this possibility, we randomly bootstrapped (with replacement) both the English and the Swedish data to the same number of participants and test tokens included in the analysis of the Swedish data (11 in the /d/-exposure group and 11 in the control group). We then repeated the analyses reported in the main text, comparing the Swedish and (down-sampled) English test data. This process was repeated 100 times.
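The participant-level part of this resampling procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used for the reported analyses: it resamples participants only (the resampling of test tokens and the refitting of the regression on each sample are omitted), and the participant IDs, function names, and random seed are hypothetical.

```python
import random

def resample_participants(participants_by_group, n_per_group, rng):
    """Draw n_per_group participants from each group, with replacement."""
    return {
        group: [rng.choice(ids) for _ in range(n_per_group)]
        for group, ids in participants_by_group.items()
    }

def bootstrap_samples(participants_by_group, n_per_group=11, n_iterations=100, seed=1):
    """Generate n_iterations resampled participant sets (one per model refit)."""
    rng = random.Random(seed)
    return [
        resample_participants(participants_by_group, n_per_group, rng)
        for _ in range(n_iterations)
    ]

# Hypothetical IDs: 24 participants per group, as in the English experiment.
english = {
    "/d/-exposure": [f"E{i:02d}" for i in range(24)],
    "control": [f"C{i:02d}" for i in range(24)],
}
samples = bootstrap_samples(english)
```

Because sampling is with replacement, the same participant can be drawn more than once within a sample. One common choice is to relabel such repeated draws as distinct participants before refitting, so that the random-effect grouping treats them as separate units; whether this was done here is not stated in the text.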

In 94 out of 100 samples, we observed a three-way interaction in the predicted direction, and in 43 out of 100 samples we replicated the critical significant three-way interaction between Language, Exposure group, and Sound category. The simple effect of the two-way interaction between Exposure group and Sound was significant for the English data in 40 out of 100 samples (all in the predicted direction), whereas it was significant for the Swedish data in 26 out of 100 samples (25 of them in the opposite of the predicted direction). In addition, in 98 out of 100 samples, the t-value of the English data’s two-way interaction was at least as high as the actual t-value of the Swedish data.
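The counts reported in this paragraph can be recovered from the bootstrap summary table above. The following Python sketch copies the ten rows of that table and tallies them. The effect labels are shorthand chosen here, since the term names are truncated in the table.

```python
# Keys: (effect label, significant?, sign of the estimate); values: number of samples.
counts = {
    ("three-way interaction", False, "-"): 51,
    ("three-way interaction", False, "+"): 6,
    ("three-way interaction", True, "-"): 43,
    ("English simple effect", False, "-"): 7,
    ("English simple effect", False, "+"): 53,
    ("English simple effect", True, "+"): 40,
    ("Swedish simple effect", False, "-"): 54,
    ("Swedish simple effect", False, "+"): 20,
    ("Swedish simple effect", True, "-"): 25,
    ("Swedish simple effect", True, "+"): 1,
}

def tally(effect, significant=None, direction=None):
    """Sum the counts for one effect, optionally filtered by significance and sign."""
    return sum(
        n
        for (eff, sig, sign), n in counts.items()
        if eff == effect
        and (significant is None or sig == significant)
        and (direction is None or sign == direction)
    )

three_way_negative = tally("three-way interaction", direction="-")       # 51 + 43
three_way_significant = tally("three-way interaction", significant=True)
english_significant = tally("English simple effect", significant=True)
swedish_significant = tally("Swedish simple effect", significant=True)   # 25 + 1
```

Each effect's rows sum to 100, the number of bootstrap iterations.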

In short, we reliably replicate the difference between the English and Swedish data even when both data sets have only 22 participants.

Recording of materials

English

Recordings were made of a male native-Mandarin speaker who was a late second language learner of English and had resided in the United States for 18 months at the time of recording. Recordings were made in a soundproof room using a microphone onto a digital recorder, digitally sampled at 44.1 kHz and normalized for root mean square amplitude to 70 dB sound pressure level.

Swedish

Recordings were made of a 25-year-old female native speaker of the Brabantish dialect of Central Flanders, with level A1 (CEFR) knowledge of Swedish at the time of recording. As the speaker was very inexperienced with Swedish and therefore unfamiliar with many of the words, recordings of the materials spoken by a female native Swedish speaker were also made. These served as native exemplars for the Flemish speaker.

Recordings were made in a sound-attenuated room at the Stockholm University Multilingualism Lab. A recording of the native speaker producing the target word was first played to the Flemish speaker over Sony MDR-7506 headphones at a comfortable volume. Simultaneously and throughout the trial, the target word was displayed on a computer screen placed within a comfortable viewing distance. An audible beep was played 2 seconds after trial onset (after the native recording had finished playing) to cue production of the target. Words were spoken into an Audio-Technica AT3035 microphone, placed directly in front of the speaker. Recordings were sampled at 44.1 kHz. The experimenter controlled the presentation of each word, which appeared three times in random order to give the speaker sufficient time and opportunity to say the words correctly. Recording samples were screened for vowel mispronunciation (e.g., mispronouncing a long vowel as short), and mispronounced samples were excluded from consideration. The word lists were divided into exposure /d/-final words, filler words, replacement words, and test words. These were recorded in separate sessions. Minimal pair test words were presented in separate lists to avoid deliberate contrastive hyper-articulation.

Why the difference?

The two experiments differ by design in the L1-L2 background of the recorded speaker. The central purpose of the Swedish experiment was to replicate the findings of Xie et al. (2017) (and Eisner et al. (2013) for another planned experiment) for another L1-L2 combination. Whether differences between native and non-native accents in the statistics of the cue distributions for /d/ and /t/ could explain the results is the question addressed by the present paper (regardless of whether these differences are caused by the L1-L2 pairing, the specific speaker that was recorded, or any other aspect of the recording procedure).

The two experiments also differed in the recording procedure. Whereas the English recordings were elicited without playing a native pronunciation (unassisted), the Swedish recordings were elicited by first playing a native pronunciation of the target word (assisted). This decision was made because the non-native speaker was still in the early stages of L2 acquisition (A1 CEFR). In particular, Swedish has a complex vowel system, with many vowel categories that have no counterparts in the speaker’s L1 (Flemish). Furthermore, the mapping from orthography to pronunciation is non-transparent in Swedish. As would be expected, the non-native speaker struggled with vowel pronunciation. After an initial unassisted recording session, we therefore decided to re-record the Flemish speaker in the assisted condition. The perception experiment employed the recordings from this latter recording session.

Are differences in recording procedure likely to explain the results?

This is one of the specific possibilities that could drive the results of the IO models. In previous analyses, we have found that the recording procedure indeed had a strong effect on the pronunciations of the non-native speaker (Tan, Xie, & Jaeger, 2019). Specifically, we found that the category means of /d/ and /t/ in either of the two recording conditions differed significantly from the native pronunciations of the native speaker whose recordings were used in the assisted condition. However, the recordings in the unassisted condition differed significantly more from the native-accented speech, both in terms of the number of cue dimensions along which the non-native speech differed from native speech and in terms of the degree of difference (for details, see Tan et al., 2019 and section XXX).

It is therefore possible that the decision to use stimuli from the assisted recording condition caused the null effect of /d/-exposure. Indeed, this is the prediction our IO approach would make. In section XXX, we report a comparison of the IO’s predictions for the foreign-accented stimuli recorded in the unassisted condition and those recorded in the assisted condition. This comparison predicts that a perception experiment with the recordings from the unassisted condition would be substantially more likely to elicit a benefit of /d/-exposure (see Section XXX).

Are potential differences in recording quality likely to explain the results?

The recording equipment and environments were similar across the two experiments. Care was taken in both experiments to elicit recordings free of noticeable background noise. All three cues (vowel, closure, and burst duration) are durational and therefore unlikely to suffer from minor differences in recording quality. The materials for both experiments are available via OSF.

Are cues not considered in the analyses likely to explain the results?

The three cues analyzed in both experiments are considered primary cues to syllable-final voicing (REFS). It is therefore likely, but not guaranteed, that the three cues would explain a substantial part of the variation in participants’ ratings during test. The results of the ideal observer analyses support this assumption: performance of the non-native IO was high for both experiments, both on the exposure data (English /d/-exposure: XXX%; Swedish /d/-exposure: XXX%) and on the test data (English: XXX%; Swedish: XXX%). This suggests that /d/ and /t/ in both the English and the Swedish data formed clusters that were separable within the 3D space defined by the three cues.

The predictions of IOs matching the exposure condition (e.g., the native IO for the control group) achieved positive correlations with human ratings (English: \(r^2 =\) .XXX; Swedish: \(r^2 =\) .XXX). This suggests that a substantial amount of variance in participants’ ratings could be described by the simple 0-DF linking hypothesis between the three cues and ratings (Bayes’ theorem and Luce’s choice rule). The IOs fit to the Swedish data did not fail to predict participants’ performance. Rather, like the human participants, the IOs fail to find a benefit of /d/-exposure.

Exposure stimuli and procedure

During exposure, participants performed a lexical decision task. Recordings were played over headphones at a comfortable volume. Participants were instructed to decide whether the word they heard was a real word or not. Order of presentation was randomized across participants.

The full list of stimuli is available in section XXX for both Swedish and English. In both experiments participants in both groups heard a total of 180 words, including the same 60 filler words in the respective languages and 90 pseudowords that obeyed English or Swedish phonotactical rules. The remaining 30 words were the critical words, manipulated between exposure groups.

English

The /d/-exposure group heard 30 critical words ending with /d/, and without /t/-final minimal pair neighbors (e.g. ). The 30 replacement words for the control group (e.g., animal) were matched to the critical /d/-words in syllable length and mean lemma frequency (based on CELEX, Baayen, Piepenbrock, & Gulikers, 1995).

All words and nonwords were multisyllabic and contained three to four syllables. Other than in the critical /d/-final words, no alveolar stops, voiced stops, dental fricatives, or postalveolar affricates occurred. The voiceless stops (/k/ and /p/) did not appear in word-final position.

Swedish

The /d/-exposure group heard 30 critical words ending with /d/, and without /t/-final minimal pair neighbors (e.g. ). The 30 replacement words for the control group did not contain /d, t, b, g/, and were matched in syllable length and average base form frequency (based on a 25 million word database from selected corpora from Språkbanken, https://spraakbanken.gu.se/korp, accessed January 2019).

All words and pseudowords were multisyllabic and contained two to five syllables. Other than the /d/-final critical words, all stimuli were chosen to avoid voiced stops as well as /t/ in any position (i.e., no /d, t, b, g/). The other two voiceless stops (/k/ and /p/) were kept to a minimum but could not be fully avoided while retaining a sufficient number of exposure words. Overall, there were 28 occurrences of /k/ or /p/ in the exposure words for both groups (14 each), and 1 occurrence in the filler words.

Unintended by the design, one critical word in the /d/-exposure group contained a word-medial syllable-initial /d/ (medellivslängd, ‘average life span’), one pseudoword contained a word-medial syllable-initial /d/ (mörvinder, meaningless), and one pseudoword contained a word-final /t/ (spållrivet, meaningless).

Why the difference?

Other than the three unintended occurrences of /d, t/, differences between the English and Swedish exposure stimuli were a result of the constraints imposed by the two languages and the goal to balance word frequency across the two exposure groups. Under those constraints, complete avoidance of all stops other than syllable-final /d/ in the /d/-exposure group was not possible.

Are these differences likely to explain the results?

The occurrences of word-medial syllable-initial /d/ and word-final /t/ in the pseudoword list are unlikely to explain the null effect for the Swedish data. First, participants reliably categorized these items as non-words (M = XXX, SD = XXX). This also means that these two exposures were not lexically labeled: participants did not get information as to whether the tokens they heard were meant to be pronunciations of /d/ or /t/. Even if participants gathered information from these pronunciations, it is unlikely that the two tokens alone (heard by both groups) led to so much adaptation that no further difference between exposure groups could be detected.

Finally, the word-medial /d/ in medellivslängd was only heard by the /d/-exposure group and thus cannot have affected adaptation in the control group. It is also unlikely that the occurrence of syllable-initial /d/ strongly affected adaptation in the /d/-exposure group. On the one hand, only syllable-final stops are devoiced in Flemish, so that the syllable-initial /d/ pronunciation of the Flemish-accented speaker is unlikely to deviate as strongly from native Swedish pronunciation. On the other hand, participants in the /d/-exposure group heard 30 words with syllable-final /d/, all of which deviated from native pronunciation. Even if both the two pseudowords and the word-medial /d/ in medellivslängd somehow affected participants’ ratings during test, we would expect that the 30 critical exposure words with syllable-final /d/ would affect participants’ ratings over and above that effect.

Test stimuli and procedure

The test phase followed immediately after the exposure phase. In both experiments, the test phase started with two blocks in which participants rated recordings of /d, t/-final minimal pair words. Specifically, participants were asked to rate the final sound of the words on a 1-7 scale for how good an example that sound was for the named category (either /d/ or /t/), with 1 being the worst rating and 7 being the best. In one of these two test blocks, participants rated /d/-goodness. In the other test block, they rated /t/-goodness. The two words of a minimal pair never occurred within the same block. The order of the two blocks was counter-balanced across participants. Within each block, order of presentation was randomized across participants.

Since neither the present nor previous work found an interaction of the rating question (whether participants were asked to rate /d/- or /t/-goodness) with any of the predictors of interest, we transformed /t/-goodness ratings into /d/-goodness ratings by subtracting them from 8 (so that, e.g., a /t/-goodness rating of 7 became a /d/-goodness rating of 1).
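As a minimal sketch of this recoding on the 1-7 scale (the function name and the category labels "d" and "t" are illustrative assumptions, not names from the analysis scripts):

```python
def to_d_goodness(rating, rated_category):
    """Return a rating expressed as /d/-goodness on the 1-7 scale.

    rating: integer rating between 1 and 7.
    rated_category: "d" if the participant rated /d/-goodness,
        "t" if they rated /t/-goodness (hypothetical labels).
    """
    if not 1 <= rating <= 7:
        raise ValueError("ratings must lie on the 1-7 scale")
    if rated_category == "d":
        return rating
    if rated_category == "t":
        # Mirror the scale: 7 becomes 1, 4 stays 4, 1 becomes 7.
        return 8 - rating
    raise ValueError("rated_category must be 'd' or 't'")


# Toy ratings: (rating, rated category)
toy_ratings = [(7, "t"), (4, "t"), (2, "d"), (1, "t")]
transformed = [to_d_goodness(r, c) for r, c in toy_ratings]
```

The midpoint of the scale (4) is unchanged by the transformation, so neutral ratings remain neutral regardless of which question was asked.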

English

The test stimuli included 60 monosyllabic minimal pairs ending in /d/ or /t/ (e.g., seed–seat). Other than the final stops, the same restrictions on sounds as in the exposure words were applied here.

Swedish

The Swedish experiment employed fewer test stimuli (64 /d/-/t/-final words from 32 minimal pairs, e.g., röd, ‘red’ (comm. gen.), and röt, ‘shout’ (pret.)). Additionally, test words were allowed to have word-initial voiced stops other than /d/ (e.g., bädd-bett).

Two pairs (vård-vårt and hård-hårt) were excluded from analysis because dental stops preceded by /r/ in Swedish are pronounced as retroflex variants of /d, t/, so these items would not have been consistent with the rest of the set.

Due to a misunderstanding, the Swedish data did not counter-balance whether a word recording was rated for /d/- or for /t/-goodness (e.g., the word rid, ‘ride’, was always rated for /d/-goodness; the word rit, ‘rite’, was always rated for /t/-goodness; across minimal pair words, /d/- and /t/-final words were equally often rated for /d/- and /t/-goodness).

Finally, the Swedish experiment contained a third and fourth test block during which participants rated 54 words from 27 /d/-/t/-initial minimal pairs (e.g., dom and tom).

The order of these later test blocks was counter-balanced across participants but participants always completed the test block with word-final ratings before performing the word-initial ratings. The data from word-initial ratings tasks are not part of the current analysis, though we note that exposure group did not affect the Swedish word-initial ratings either.

Why these differences?

The Swedish experiment employed fewer test tokens because it was difficult to find additional minimal pairs for Swedish. While there are many /d/-/t/-final minimal pairs in Swedish, a large number of them share the same stem (e.g., hård, ‘hard’ (comm. gen., adverb), and hårt, ‘hard’ (neut., adj.)). Because the materials for this pilot were designed in conjunction with the planning of a similar study involving priming, minimal pairs of this type would have been unsuitable and were therefore excluded. The limited options also motivated the decision to allow test stimuli in which voiced stops (other than /d/) occurred at the word onset.

The additional test blocks with /d/-/t/-initial minimal pair words were included in the Swedish experiment in order to investigate whether exposure to devoiced /d/-final words might result in higher goodness ratings for /d/-/t/-initial words (building on Eisner et al., 2013).

Are these differences likely to explain the results?

Compared to the English experiment, the Swedish experiment had about 50% fewer items. The resulting reduction in power could in principle explain the null effect for Swedish. However, this possibility was ruled out by the bootstrap analyses reported in Section XXX.
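The bootstrap analyses themselves are reported in Section XXX and are not reproduced here. Purely to illustrate the general logic of such a check, the sketch below resamples a reduced number of items from a set of per-item exposure effects and asks how often the effect remains in the expected direction. The per-item effect values, the function name, and the summary statistic are hypothetical and are not taken from the reported analyses.

```python
import random


def proportion_positive(item_effects, n_items, n_boot=1000, seed=1):
    """Proportion of resamples whose mean item effect is above zero.

    item_effects: per-item differences in mean /d/-goodness rating
        between exposure groups (hypothetical values).
    n_items: number of items drawn (with replacement) per resample.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    positive = 0
    for _ in range(n_boot):
        sample = [rng.choice(item_effects) for _ in range(n_items)]
        if sum(sample) / n_items > 0:
            positive += 1
    return positive / n_boot


# Hypothetical effects for 60 items: most positive, some negative.
toy_effects = [0.2] * 40 + [-0.1] * 20

# How often does the effect survive with only 30 items per resample?
survival_30 = proportion_positive(toy_effects, n_items=30)
```

If an effect of the size seen with the full item set is still reliably detected with half as many items, reduced item count alone is an unlikely explanation for a null result.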

It is unclear how the other differences in the materials, or the inclusion of additional (later) test blocks would explain the Swedish data.

Acoustic analysis

Swedish

Tokens were annotated for the durations of vowel, closure, and burst. Annotations were completed in Praat (Boersma, 2001) using visual examination of spectrograms and listening judgments. Cue boundaries were marked following conventions (Flege, Munro, & Skelton, 1992). Vowel duration was measured from the beginning of the first periodic portion of each waveform to the zero-crossing where the amplitude decreased abruptly and the waveform became sinusoidal. Burst was measured from stop release to the first zero-crossing point where the amplitude became near zero. Closure was measured as the time between vowel offset and burst onset (for stops following nasals, closure onset was marked by an abrupt decline in amplitude of the nasal).

Description of Ideal Observer Framework

Data preparation

  1. Extraction of acoustic cues. We obtained measurements of three durational cues that are crucial for coda voicing: the durations of the preceding vowel, the closure interval, and the burst release.
  2. Cue measurement correction. The purpose of correction (or residualization) is to control for phonological context effects without losing the difference between accents within each language. This procedure was conducted because a) exposure words contain only /d/ words, whereas test words are /d/-/t/ minimal pairs; b) the presence of phonological contexts (e.g., l, r, nasals) varies between exposure and test words; c) exposure words (multisyllabic) are longer than test words (monosyllabic). For these reasons, we residualize as follows.
  • STEP 1: y ~ 1 + (phon_ctxt_vl + phon_ctxt_r + phon_ctxt_n), data = test /d/ and /t/ words only. This yields the predicted effects of phonological context, which we apply to the exposure words.
  • STEP 2: y ~ 1 + syllable, data = exposure /d/ and test /d/ words only. This yields the predicted effects of syllable length, which we apply to the test /t/ words.
  • STEP 3: take the between-talker difference (native minus non-native) for the /t/ category and add it to the cue values of all tokens of the non-native talker. The aim is to make sure that the /t/ category is entirely aligned between the native and non-native speakers.
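STEP 3 can be sketched as a per-cue mean shift. The example below is illustrative only: the data layout (one dictionary per token), the function names, and the toy durations in ms are assumptions, and STEPS 1 and 2 are not covered.

```python
from statistics import mean

CUES = ("vowel", "closure", "burst")


def category_means(tokens):
    """Mean of each durational cue over a list of tokens."""
    return {cue: mean(tok[cue] for tok in tokens) for cue in CUES}


def align_to_native_t(nonnative_tokens, native_t, nonnative_t):
    """Shift all non-native tokens by the native-minus-non-native /t/ means."""
    native_mu = category_means(native_t)
    nonnative_mu = category_means(nonnative_t)
    shift = {cue: native_mu[cue] - nonnative_mu[cue] for cue in CUES}
    # The same shift is added to every token, /d/ and /t/ alike.
    return [
        {**tok, **{cue: tok[cue] + shift[cue] for cue in CUES}}
        for tok in nonnative_tokens
    ]


# Toy data (ms); values are made up for illustration.
native_t = [
    {"category": "t", "vowel": 100, "closure": 90, "burst": 40},
    {"category": "t", "vowel": 120, "closure": 80, "burst": 50},
]
nonnative_t = [
    {"category": "t", "vowel": 90, "closure": 70, "burst": 30},
    {"category": "t", "vowel": 110, "closure": 80, "burst": 40},
]
nonnative_d = [{"category": "d", "vowel": 150, "closure": 60, "burst": 20}]

aligned = align_to_native_t(nonnative_t + nonnative_d, native_t, nonnative_t)
aligned_t = [tok for tok in aligned if tok["category"] == "t"]
aligned_d = [tok for tok in aligned if tok["category"] == "d"]
```

Because every non-native token receives the same shift, the distance between the non-native /d/ and /t/ categories is preserved while the /t/ means of the two talkers coincide.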

Visualize test tokens in 3d space before and after cue correction (residualization) procedure.

Structure of Ideal Observers

Within each language, we compare the predictions of two ideal observers: a native model and a non-native model. The only difference between the native and non-native models lies in the /d/ tokens used for training. Since exposure items and test items are different, there is no overlap between the training and test data used for the Ideal Observers.

  • Native model:
      • Exposure /d/ comes from the native talker’s exposure /d/ (30 tokens)
      • Exposure /t/ comes from the native talker’s test /t/ (downsampled to 30 tokens)
      • Test /d/ and /t/ come from the non-native talker’s test /d/ and /t/
  • Non-native model:
      • Exposure /d/ comes from the non-native talker’s exposure /d/ (30 tokens)
      • Exposure /t/ comes from the native talker’s test /t/ (downsampled to 30 tokens)
      • Test /d/ and /t/ come from the non-native talker’s test /d/ and /t/
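To make this structure concrete, the following is a minimal sketch of an ideal observer over the three durational cues. It assumes one multivariate Gaussian per category, equal prior probabilities for /d/ and /t/, and Bayes' theorem to obtain the posterior of /d/. The function names and the toy cue values (in ms) are illustrative assumptions, not the fitted models or the data reported here.

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu):
    """Unbiased sample covariance matrix."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def fit_category(tokens):
    """Training: estimate mean and (Cholesky factor of) covariance."""
    mu = mean_vector(tokens)
    return mu, cholesky(covariance(tokens, mu))


def log_density(x, mu, low):
    """Log density of a multivariate Gaussian at x."""
    d = len(mu)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = []  # forward substitution: solve low * z = diff
    for i in range(d):
        z.append((diff[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + sum(v * v for v in z))


def posterior_d(x, d_params, t_params):
    """p(/d/ | cues) under equal priors (an assumption of this sketch)."""
    delta = log_density(x, *t_params) - log_density(x, *d_params)
    delta = max(min(delta, 700.0), -700.0)  # avoid overflow in exp
    return 1.0 / (1.0 + math.exp(delta))


# Toy training tokens: (vowel, closure, burst) durations in ms.
d_train = [(150, 60, 20), (160, 55, 25), (140, 65, 22), (155, 58, 18), (145, 62, 27)]
t_train = [(100, 90, 40), (110, 85, 45), (90, 95, 42), (105, 88, 38), (95, 92, 47)]
d_params, t_params = fit_category(d_train), fit_category(t_train)

p_near_d = posterior_d((150, 60, 22), d_params, t_params)
p_near_t = posterior_d((100, 90, 42), d_params, t_params)
```

In this sketch, a native and a non-native model would differ only in which talker's /d/ tokens are passed to `fit_category`; the /t/ training tokens and the test tokens would be the same for both.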

Visualize Ideal Observer predictions

Plot bar graphs.

Extract model parameters for the Swedish dataset.

Plot results for the Swedish dataset in 3d space.

Extract model parameters for the English dataset.

Plot the differences between the native and non-native models.

Do Ideal Observers predict the human responses?

We expect to see a match between the Ideal Observer predictions and human responses in two ways.

First, at the group level, we expect human responses from the two groups (/d/-exposure vs. control) to match the predictions of the two corresponding models (non-native model vs. native model), respectively.

  • human resp (d-goodness) ~ model prediction (d-goodness) X intended category X consistence

Second, at the item level, we hope to predict human responses from the posterior of each token. Ideally, we expect to see a consistent correlation between IO predictions and human data across the two categories (/d/ and /t/). This ideal situation is based on the following assumptions:

  1. We have enough participant data.
  2. The models use all cues that humans have access to. A violation of this assumption is likely to lead to lines that are shifted apart (assuming that the other assumptions are orthogonal to this one).
  3. The space in which we assess performance is the same as the one human participants use.
  4. The cues follow multivariate Gaussian distributions.
  5. The link function between the IO prediction and the human response is correct.

Group-level performance

Is there consistence between human experimental conditions and ideal observers?

We compare the transformed /d/-goodness ratings from the rating tasks with the posterior of the /d/ category from the ideal observers. There is a clear difference between the two experiments, but in both cases the model predictions match the patterns of human ratings.

  • For the Swedish data, the Ideal Observers show no difference between the native model and the non-native model; mirroring this, the human data show no difference between the /d/-exposure and the control groups.
  • For the English data, the Ideal Observers predict significantly better performance for the non-native model than for the native model for both /d/ and /t/; this pattern is also reflected in the human data, with higher ratings for the intended category (for both /d/ and /t/).

Item-level performance

  1. Are there correlations between human ratings and ideal observers predictions?
  2. Are the correlations stronger for the consistent pairing (consistence coded as “yes”: /d/-exposure and non-native model; control and native model)?

We run mixed-effects models to test whether human responses (rated /d/-goodness) are predicted by the model predictions (some function of the posterior of /d/). We expect to see an interaction between consistence (/d/-exposure matching non-native model; control matching native model) and model-predicted /d/-goodness.

  • Note: we do not yet have a linking function from the posterior of /d/ to the human /d/-goodness ratings. For now, we check whether the posterior of /d/ and its log-odds are related to human /d/-goodness ratings.
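For reference, the log-odds of a posterior can be computed as below. This is a minimal sketch: the function name and the clipping constant `eps`, which keeps posteriors of exactly 0 or 1 finite, are assumptions rather than part of the reported analysis.

```python
import math


def log_odds(posterior, eps=1e-6):
    """Log-odds (logit) of a posterior probability of /d/.

    Posteriors are clipped to [eps, 1 - eps] so that values of
    exactly 0 or 1 do not produce infinite log-odds.
    """
    p = min(max(posterior, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# A posterior of .5 maps to 0; values above .5 map to positive log-odds.
toy_posteriors = [0.1, 0.5, 0.9, 1.0]
toy_log_odds = [log_odds(p) for p in toy_posteriors]
```

Unlike the raw posterior, the log-odds scale is unbounded, which spreads out tokens whose posteriors are close to 0 or 1.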
##         [,1]
## Swedish    1
## English   -1
##              [,1]
## /d/-exposure    1
## control        -1
##   [,1]
## d    1
## t   -1
##            [,1]
## non-native    1
## native       -1
##     [,1]
## yes    1
## no    -1
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## zrating.simplified_d ~ native_language * posterior.c * consistence *  
##     Sound + (1 | MinimalPairID) + (1 | Participant)
##    Data: 
## d.test.all %>% filter(model == "d") %>% mutate(posterior.c = scale(posterior,  
##     center = T, scale = F))
## 
## REML criterion at convergence: 34057.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4920 -0.7156 -0.0029  0.6846  4.2710 
## 
## Random effects:
##  Groups        Name        Variance  Std.Dev. 
##  MinimalPairID (Intercept) 9.089e-02 3.015e-01
##  Participant   (Intercept) 6.231e-33 7.893e-17
##  Residual                  6.200e-01 7.874e-01
## Number of obs: 14280, groups:  MinimalPairID, 90; Participant, 71
## 
## Fixed effects:
##                                                    Estimate Std. Error
## (Intercept)                                      -1.380e-01  5.743e-02
## native_language1                                 -1.251e-01  5.743e-02
## posterior.c                                       1.438e-01  8.247e-02
## consistence1                                     -2.412e-03  3.349e-02
## Sound1                                            5.032e-01  4.032e-02
## native_language1:posterior.c                     -7.723e-02  8.247e-02
## native_language1:consistence1                     2.814e-03  3.349e-02
## posterior.c:consistence1                          6.458e-02  6.869e-02
## native_language1:Sound1                           1.068e-01  4.032e-02
## posterior.c:Sound1                                3.198e-01  9.691e-02
## consistence1:Sound1                              -1.930e-02  3.349e-02
## native_language1:posterior.c:consistence1        -1.988e-02  6.869e-02
## native_language1:posterior.c:Sound1               2.528e-01  9.691e-02
## native_language1:consistence1:Sound1             -3.402e-04  3.349e-02
## posterior.c:consistence1:Sound1                   1.068e-02  6.870e-02
## native_language1:posterior.c:consistence1:Sound1 -1.618e-02  6.870e-02
##                                                          df t value
## (Intercept)                                       6.443e+02  -2.404
## native_language1                                  6.443e+02  -2.179
## posterior.c                                       1.304e+04   1.743
## consistence1                                      1.417e+04  -0.072
## Sound1                                            1.281e+04  12.477
## native_language1:posterior.c                      1.304e+04  -0.936
## native_language1:consistence1                     1.417e+04   0.084
## posterior.c:consistence1                          1.417e+04   0.940
## native_language1:Sound1                           1.281e+04   2.649
## posterior.c:Sound1                                9.042e+03   3.300
## consistence1:Sound1                               1.417e+04  -0.576
## native_language1:posterior.c:consistence1         1.417e+04  -0.289
## native_language1:posterior.c:Sound1               9.042e+03   2.609
## native_language1:consistence1:Sound1              1.417e+04  -0.010
## posterior.c:consistence1:Sound1                   1.417e+04   0.156
## native_language1:posterior.c:consistence1:Sound1  1.417e+04  -0.235
##                                                  Pr(>|t|)    
## (Intercept)                                      0.016519 *  
## native_language1                                 0.029725 *  
## posterior.c                                      0.081299 .  
## consistence1                                     0.942583    
## Sound1                                            < 2e-16 ***
## native_language1:posterior.c                     0.349051    
## native_language1:consistence1                    0.933035    
## posterior.c:consistence1                         0.347204    
## native_language1:Sound1                          0.008084 ** 
## posterior.c:Sound1                               0.000971 ***
## consistence1:Sound1                              0.564445    
## native_language1:posterior.c:consistence1        0.772327    
## native_language1:posterior.c:Sound1              0.009099 ** 
## native_language1:consistence1:Sound1             0.991895    
## posterior.c:consistence1:Sound1                  0.876411    
## native_language1:posterior.c:consistence1:Sound1 0.813853    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 16 > 12.
## Use print(summary(prediction.lmer), correlation=TRUE)  or
##     vcov(summary(prediction.lmer))        if you need it
## convergence code: 0
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## zrating.simplified_d ~ native_language * posterior.c * consistence +  
##     (1 | MinimalPairID) + (1 + Sound | Participant)
##    Data: 
## d.test.all %>% filter(model == "d") %>% mutate(posterior.c = scale(posterior,  
##     center = T, scale = F))
## 
## REML criterion at convergence: 33890.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5169 -0.7086  0.0245  0.6799  4.4052 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev. Corr
##  MinimalPairID (Intercept) 0.1045   0.3233       
##  Participant   (Intercept) 0.0000   0.0000       
##                Sound1      0.2358   0.4856    NaN
##  Residual                  0.6007   0.7750       
## Number of obs: 14280, groups:  MinimalPairID, 90; Participant, 71
## 
## Fixed effects:
##                                             Estimate Std. Error         df
## (Intercept)                               -3.200e-03  3.715e-02  9.119e+01
## native_language1                          -5.649e-03  3.715e-02  9.119e+01
## posterior.c                                2.159e-01  5.860e-02  1.007e+03
## consistence1                              -1.152e-05  8.242e-03  1.409e+04
## native_language1:posterior.c              -1.550e-02  5.860e-02  1.007e+03
## native_language1:consistence1              2.781e-05  8.242e-03  1.409e+04
## posterior.c:consistence1                  -2.388e-04  1.939e-02  1.412e+04
## native_language1:posterior.c:consistence1  3.477e-03  1.939e-02  1.412e+04
##                                           t value Pr(>|t|)    
## (Intercept)                                -0.086 0.931541    
## native_language1                           -0.152 0.879478    
## posterior.c                                 3.684 0.000242 ***
## consistence1                               -0.001 0.998885    
## native_language1:posterior.c               -0.264 0.791523    
## native_language1:consistence1               0.003 0.997308    
## posterior.c:consistence1                   -0.012 0.990173    
## native_language1:posterior.c:consistence1   0.179 0.857694    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ntv_l1 pstrr. cnsst1 nt_1:. nt_1:1 pst.:1
## nativ_lngg1  0.350                                          
## posterior.c -0.063 -0.065                                   
## consistenc1  0.000  0.000  0.005                            
## ntv_lngg1:. -0.065 -0.063  0.837  0.005                     
## ntv_lngg1:1  0.000  0.000  0.005  0.616  0.005              
## pstrr.c:cn1  0.000  0.000  0.000 -0.058  0.000 -0.076       
## ntv_ln1:.:1  0.000  0.000  0.000 -0.076  0.000 -0.058  0.289
## convergence code: 0
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## zrating.simplified_d ~ native_language/posterior.c * consistence +  
##     (1 | MinimalPairID) + (1 + Sound | Participant)
##    Data: 
## d.test.all %>% filter(model == "d") %>% mutate(posterior.c = scale(posterior,  
##     center = T, scale = F))
## 
## REML criterion at convergence: 33888
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5169 -0.7086  0.0245  0.6799  4.4052 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev. Corr
##  MinimalPairID (Intercept) 0.1045   0.3233       
##  Participant   (Intercept) 0.0000   0.0000       
##                Sound1      0.2358   0.4856    NaN
##  Residual                  0.6007   0.7750       
## Number of obs: 14280, groups:  MinimalPairID, 90; Participant, 71
## 
## Fixed effects:
##                                                   Estimate Std. Error
## (Intercept)                                     -3.200e-03  3.715e-02
## native_language1                                -5.649e-03  3.715e-02
## consistence1                                    -1.152e-05  8.242e-03
## native_languageSwedish:posterior.c               2.004e-01  1.123e-01
## native_languageEnglish:posterior.c               2.314e-01  3.346e-02
## native_language1:consistence1                    2.781e-05  8.242e-03
## native_languageSwedish:posterior.c:consistence1  3.238e-03  3.114e-02
## native_languageEnglish:posterior.c:consistence1 -3.716e-03  2.312e-02
##                                                         df t value
## (Intercept)                                      9.119e+01  -0.086
## native_language1                                 9.119e+01  -0.152
## consistence1                                     1.409e+04  -0.001
## native_languageSwedish:posterior.c               8.632e+02   1.784
## native_languageEnglish:posterior.c               1.401e+04   6.915
## native_language1:consistence1                    1.409e+04   0.003
## native_languageSwedish:posterior.c:consistence1  1.409e+04   0.104
## native_languageEnglish:posterior.c:consistence1  1.415e+04  -0.161
##                                                 Pr(>|t|)    
## (Intercept)                                       0.9315    
## native_language1                                  0.8795    
## consistence1                                      0.9989    
## native_languageSwedish:posterior.c                0.0748 .  
## native_languageEnglish:posterior.c               4.9e-12 ***
## native_language1:consistence1                     0.9973    
## native_languageSwedish:posterior.c:consistence1   0.9172    
## native_languageEnglish:posterior.c:consistence1   0.8723    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ntv_l1 cnsst1 nt_S:. nt_E:. nt_1:1 n_S:.:
## nativ_lngg1  0.350                                          
## consistenc1  0.000  0.000                                   
## ntv_lnggS:. -0.067 -0.067  0.005                            
## ntv_lnggE:.  0.005 -0.005  0.000  0.000                     
## ntv_lngg1:1  0.000  0.000  0.616  0.005  0.000              
## ntv_lnS:.:1  0.000  0.000 -0.083  0.000  0.000 -0.083       
## ntv_lnE:.:1  0.000  0.000  0.015  0.000  0.000 -0.015  0.000
## convergence code: 0
## boundary (singular) fit: see ?isSingular